312 research outputs found

    Periostin shows increased evolutionary plasticity in its alternatively spliced region

    Background: Periostin (POSTN) is a secreted extracellular matrix protein of poorly defined function that has been related to bone and heart development as well as to cancer. In human and mouse, it is known to undergo alternative splicing in its C-terminal region, which is devoid of known protein domains. Differential expression of periostin, sometimes of specific splicing isoforms, is observed in a broad range of human cancers, including breast, pancreatic, and colon cancer. Here, we combine genomic and transcriptomic sequence data from vertebrate organisms to study the evolution of periostin and particularly of its C-terminal region. Results: We found that the C-terminal part of periostin is markedly more variable among vertebrates than the rest of periostin in terms of exon count, length, and splicing pattern, which we interpret as a consequence of neofunctionalization after the split between periostin and its paralog transforming growth factor, beta-induced (TGFBI). We also defined periostin's sequential 13-amino acid repeat units - well conserved in teleost fish, but more obscure in higher vertebrates - whose secondary structure is predicted to be consecutive beta strands. We suggest that these beta strands may mediate binding interactions with other proteins through an extended beta-zipper in a manner similar to the way repeat units in bacterial cell wall proteins have been reported to bind human fibronectin. Conclusions: Our results, obtained with the help of the increasingly large collection of complete vertebrate genomes, document the evolutionary plasticity of periostin's C-terminal region, and for the first time suggest a basis for its functional role. Helmholtz Alliance on Systems Biology

    MLTrends: Graphing MEDLINE term usage over time

    The MEDLINE database of medical literature is routinely used by researchers and doctors to find articles pertaining to their area of interest. Insight into historical changes in research areas and use of scientific language may be gained by chronological analysis of the 18 million records currently in the database; however, such analysis is generally complex and time-consuming. The authors’ MLTrends web application graphs term usage in MEDLINE over time, allowing the determination of emergence dates for biomedical terms and historical variations in term usage intensity. Terms considered are individual words or quoted phrases, which may be combined using Boolean operators. MLTrends can plot, for multiple terms simultaneously, the number of records in MEDLINE per year whose titles or abstracts match each queried term. The MEDLINE database is stored and indexed on the MLTrends server, allowing queries to be completed and graphs generated in less than one second. Queries may be performed on all titles and/or abstracts in MEDLINE and can include stop words. The resulting graphs may be normalized by total publications or words per year to facilitate term usage comparison between years. This makes MLTrends a powerful tool for rapid graphical evaluation of the evolution of biomedical research and language. MLTrends may be used at: http://www.ogic.ca/mltrend
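The per-year counting and normalization described in this abstract can be illustrated with a minimal sketch. It is not the MLTrends implementation: the function name `term_usage_by_year`, the `(year, text)` record layout, and the plain case-insensitive substring match are all assumptions made for illustration, whereas the real service queries an indexed copy of MEDLINE.

```python
from collections import Counter


def term_usage_by_year(records, term, normalize=False):
    """Count records per year whose text contains `term`.

    `records` is an iterable of (year, text) pairs (a hypothetical layout).
    With normalize=True, each count is divided by the total number of
    records for that year, mirroring normalization by total publications.
    """
    needle = term.lower()
    matches = Counter()  # records per year that mention the term
    totals = Counter()   # all records per year
    for year, text in records:
        totals[year] += 1
        if needle in text.lower():  # simplification: substring match only
            matches[year] += 1
    usage = {}
    for year in sorted(totals):
        count = matches[year]
        usage[year] = count / totals[year] if normalize else count
    return usage


# Toy records standing in for MEDLINE titles (invented for this example).
records = [
    (2001, "Periostin expression in breast cancer"),
    (2001, "Circular dichroism of a small protein"),
    (2002, "Cancer genomics overview"),
    (2002, "Breast cancer biomarkers"),
]

raw = term_usage_by_year(records, "cancer")
relative = term_usage_by_year(records, "cancer", normalize=True)
```

Normalizing by the yearly total separates a genuine rise in a term's usage from the overall growth of the literature, which is why the abstract presents it as the way to compare years.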

    K2D2: Estimation of protein secondary structure from circular dichroism spectra

    Background: Circular dichroism spectroscopy is a widely used technique to analyze the secondary structure of proteins in solution. Predictive methods use the circular dichroism spectra from proteins of known tertiary structure to assess the secondary structure contents of a protein with unknown structure given its circular dichroism spectrum. Results: We developed K2D2, a method with an associated web server to estimate protein secondary structure from circular dichroism spectra. The method uses a self-organized map of spectra from proteins with known structure to deduce a map of protein secondary structure that is used to make the predictions. Conclusion: The K2D2 server is publicly accessible at http://www.ogic.ca/projects/k2d2/. It accepts as input a circular dichroism spectrum and outputs the estimated secondary structure content (alpha-helix and beta-strand) of the corresponding protein, as well as an estimated measure of error.

    Pseudogenes as an alternative source of natural antisense transcripts

    Background: Naturally occurring antisense transcripts (NATs) are non-coding RNAs that may regulate the activity of sense transcripts to which they bind because of complementarity. NATs that are not located in the gene they regulate (trans-NATs) have better chances to evolve than cis-NATs, which is evident when the sense strand of the cis-NAT is part of a protein coding gene. However, the generation of a trans-NAT requires the formation of a relatively large region of complementarity to the gene it regulates. Results: Pseudogene formation may be one evolutionary mechanism that generates trans-NATs to the parental gene. For example, this could occur if the parental gene is regulated by a cis-NAT that is copied as a trans-NAT in the pseudogene. To support this, we identified human pseudogenes with a trans-NAT to the parental gene in their antisense strand by analysis of the database of expressed sequence tags (ESTs). We found that the mutations that appeared in these trans-NATs after the pseudogene formation do not show the flat distribution that would be expected in a non-functional transcript. Instead, we found higher similarity to the parental gene in a region near the 3' end of the trans-NATs. Conclusions: Our results do not imply a functional relation of the trans-NAT arising from pseudogenes over their respective parental genes but add evidence for it and stress the importance of duplication mechanisms of genetic material in the generation of non-coding RNAs. We also provide a plausible explanation for the large transcripts that can be found in the antisense strand of some pseudogenes.

    Identification of novel stem cell markers using gap analysis of gene expression data

    A method for the detection of marker genes in large heterogeneous collections of gene expression data is described and applied to DNA microarray data generated from 83 mouse stem cell-related samples.

    Comparison of inter- and intraspecies variation in humans and fruit flies

    Variation is essential to species survival and adaptation during evolution. This variation is conferred by the imperfection of biochemical processes, such as mutations and alterations in DNA sequences, and can also be seen within genomes through processes such as the generation of antibodies. Recent sequencing projects have produced multiple versions of the genomes of humans and fruit flies (Drosophila melanogaster). These give us a chance to study how individual gene sequences vary within and between species. Here we arranged human and fly genes in orthologous pairs and compared such within-species variability with their degree of conservation between flies and humans. We observed that a significant number of proteins associated with mRNA translation are highly conserved between species and yet are highly variable within each species. The fact that we observe this in two species whose lineages separated more than 700 million years ago suggests that this is the result of a very ancient process. We hypothesize that this effect might be attributed to a positive selection for variability of virus-interacting proteins that confers a general resistance to viral hijacking of the mRNA translation machinery within populations. Our analysis points to this and to other processes resulting in positive selection for gene variation.

    Armadillo Motifs Involved in Vesicular Transport

    Armadillo (ARM) repeat proteins function in various cellular processes including vesicular transport and membrane tethering. They contain an imperfect repeating sequence motif that forms a conserved three-dimensional structure. Recently, structural and functional insight into tethering mediated by the ARM-repeat protein p115 has been provided. Here we describe the p115 ARM-motifs for reasons of clarity and nomenclature and show that both sequence and structure are highly conserved among ARM-repeat proteins. We argue that there is no need to invoke repeat types other than ARM repeats for a proper description of the structure of the p115 globular head region. Additionally, we propose to define a new subfamily of ARM-like proteins and show a lack of evidence that the ARM motifs found in p115 are present in other long coiled-coil tethering factors of the golgin family.

    Linking genes to diseases: it's all in the data

    Genome-wide association analyses on large patient cohorts are generating large sets of candidate disease genes. This is coupled with the availability of ever-increasing genomic databases and a rapidly expanding repository of biomedical literature. Computational approaches to disease-gene association attempt to harness these data sources to identify the most likely disease gene candidates for further empirical analysis by translational researchers, resulting in efficient identification of genes of diagnostic, prognostic and therapeutic value. Existing computational methods analyze gene structure and sequence, functional annotation of candidate genes, characteristics of known disease genes, gene regulatory networks, protein-protein interactions, data from animal models and disease phenotype. To date, a few studies have successfully applied computational analysis of clinical phenotype data for specific diseases and shown genetic associations. In the near future, computational strategies will be facilitated by improved integration of clinical and computational research, and by increased availability of clinical phenotype data in a format accessible to computational approaches.

    Taxonomic colouring of phylogenetic trees of protein sequences

    BACKGROUND: Phylogenetic analyses of protein families are used to define the evolutionary relationships between homologous proteins. The interpretation of protein-sequence phylogenetic trees requires the examination of the taxonomic properties of the species associated with those sequences. However, there is no online tool to facilitate this interpretation, for example, by automatically attaching taxonomic information to the nodes of a tree, or by interactively colouring the branches of a tree according to any combination of taxonomic divisions. This is especially problematic if the tree contains on the order of hundreds of sequences, which, given the accelerated increase in the size of the protein sequence databases, is a situation that is becoming common. RESULTS: We have developed PhyloView, a web-based tool for colouring phylogenetic trees according to arbitrary taxonomic properties of the species represented in a protein sequence phylogenetic tree. Provided that the tree contains SwissProt, SpTrembl, or GenBank protein identifiers, the tool retrieves the taxonomic information from the corresponding database. A colour picker displays a summary of the findings and allows the user to associate colours with the leaves of the tree according to any number of taxonomic partitions. Then, the colours are propagated to the branches of the tree. CONCLUSION: PhyloView can be used at . A tutorial, the software with documentation, and GPL-licensed source code can be accessed at the same web address.
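The propagation of leaf colours to branches can be sketched as follows. The abstract does not state the rule PhyloView applies, so this example assumes one plausible rule: an internal node takes a colour only when all of its descendant leaves share that colour, and stays uncoloured (`None`) otherwise. The function name `propagate_colours`, the `children` mapping, and the toy identifiers are hypothetical.

```python
def propagate_colours(node, children, leaf_colour, assigned):
    """Assign a colour to `node` and everything below it.

    `children` maps each internal node to a list of child nodes;
    `leaf_colour` maps leaf names to user-chosen colours.
    Assumed rule (not taken from the source): an internal node is
    coloured only if every child subtree resolves to one common colour.
    Results are written into the `assigned` dict.
    """
    kids = children.get(node, [])
    if not kids:
        # Leaf: use the colour picked for its taxonomic group, if any.
        colour = leaf_colour.get(node)
    else:
        # Visit every child first, then check whether they all agree.
        kid_colours = {
            propagate_colours(kid, children, leaf_colour, assigned)
            for kid in kids
        }
        colour = kid_colours.pop() if len(kid_colours) == 1 else None
    assigned[node] = colour
    return colour


# Toy tree: root -> (n1 -> (P1, P2), P3); identifiers are invented.
children = {"root": ["n1", "P3"], "n1": ["P1", "P2"]}
leaf_colour = {"P1": "red", "P2": "red", "P3": "blue"}

assigned = {}
propagate_colours("root", children, leaf_colour, assigned)
```

Under this rule, the clade `n1` is coloured red because both of its leaves are red, while `root` stays uncoloured because its subtrees disagree. A uniformly coloured clade therefore marks a group of sequences from a single taxonomic partition.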